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Abstract 

A fundamental problem in neuroscience is understanding how working memory - the 
ability to store information at intermediate timescales, like 10s of seconds - is implemented 
in realistic neuronal networks. The most likely candidate mechanism is the attractor network, 
and a great deal of effort has gone toward investigating it theoretically. Yet, despite almost a 
quarter century of intense work, attractor networks are not fully understood. In particular, 
there are still two unanswered questions. First, how is it that attractor networks exhibit 
irregular firing, as is observed experimentally during working memory tasks? And second, 
how many memories can be stored under biologically realistic conditions? Here we answer 
both questions by studying an attractor neural network in which inhibition and excitation 
balance each other. Using mean field analysis, we derive a three-variable description of 
attractor networks. From this description it follows that irregular firing can exist only if the 
number of neurons involved in a memory is large. The same mean field analysis also shows 
that the number of memories that can be stored in a network scales with the number of 
excitatory connections, a result that has been suggested for simple models but never shown 
for realistic ones. Both of these predictions are verified using simulations with large networks 
of spiking neurons. 

A critical component of any cognitive system is working memory - a mechanism for storing 
information about past events, and for accessing that information at later times. Without such a 
mechanism even simple tasks, like deciding whether to wear a heavy jacket or a light sweater after 
hearing the weather report, would be impossible. Although it is not known exactly how storage 
and retrieval of information is implemented in neural systems, a very natural way is through 
attractor networks. In such networks, transient events in the world trigger stable patterns of 
activity in the brain, so by looking at the pattern of activity at the current time, other areas in 
the brain can know something about what happened in the past. 

There is now considerable experimental evidence for attractor networks in areas such as 
inferior temporal cortex [1-3], prefrontal cortex [4-9], and hippocampus [10,11]. And from a 
theoretical standpoint, it is well understood how attractor networks could be implemented in 
neuronal networks, at least in principle. Essentially, all that is needed is an increase in the 
connection strength among subpopulations of neurons. If the increase is sufficiently large, then 
each sub-population can be active without input, and thus "remember" events that happened 
in the past. 
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While the basic theory of attractor networks has been known for some time [12-14], moving 
past the "in principle" qualifier, and understanding how attractors could be implemented in 
realistic, spiking networks, has been difficult. This is because the original Hopfield model violated 
several important principles: neurons did not obey Dale's law; when a memory was activated 
neurons fired near saturation, much higher than is observed experimentally in working memory 
tasks [1, 15]; and there was no null background state - no state in which all neurons fired at low 
rates. 

Most of these problems have been solved. The first, that Dale's law was violated, was 
solved by "clipping" synaptic weights; that is, by using the Hopfield prescription [12], assigning 
neurons to be either excitatory or inhibitory, and then setting any weights of the wrong sign to 
zero [16, 17]. The second, building a Hopfield type network with low firing rate, was solved by 
adding appropriate inhibition [18-23] (importantly, this was a nontrivial fix; for discussion, see 
[23]). The third problem, no null background, was solved either by making the units sufficiently 
stochastic [18-21] or adding external input [14,20-23]. 

In spite of these advancements, there are still two fundamental open questions. One is: 
how can we understand the highly irregular firing that is observed experimentally in working 
memory tasks [24]? Answering this question is important because irregular firing is thought 
to play a critical role both in how fast computations are carried out [25] and in the ability of 
networks to perform statistical inference [26]. Answering it is hard, though, because, as pointed 
out in [27], with naive scaling the net synaptic drive to the foreground neurons (the neurons 
that fire at elevated rate during memory) is proportional to the number of connections per 
neuron. Consequently, because of the high connectivity observed in cortex, the mean synaptic 
drive is much larger than the fluctuations, which implies that the foreground neurons should fire 
regularly. Moreover, as pointed out by Renart et al., [28], even for models that move beyond 
the naive scaling and produce irregularly firing neurons, the foreground neuron still tend to fire 
more regularly than the background neurons, something that is inconsistent with experiments 
[24]. 

Several studies have attempted to get around this problem, either directly or indirectly 
[22,27-29]. Most of them, however did not investigate the scaling of the network parameters 
with its size (i.e., with the number of neurons and connections). So, although parameters were 
found which led to irregular activity, it was not clear how those parameters should scale as the 



4 



size of the network increased to realistic values. In the two that did investigate scaling [27, 28], 
irregular firing was possible only if a small fraction of neurons was involved in each memory; 
i.e., only if the coding level was very small. Although there have been no direct measurements 
of the coding level during persistent activity, at least to our knowledge, experiments in superior 
temporal sulcus [30] suggest that it is much larger than the one used these models. We should 
point out, though, that the model of Renart et al. [28] is the only one in which the foreground 
neurons are at least as regular as the background neurons. 

The second open question is: what is the storage capacity of realistic attractor networks? 
That is, how many different memories can be stored in a single network? Answering this is critical 
for understanding the highly flexible and seemingly unbounded memory capacity observed in 
animals. For simple, albeit unrealistic, models the answer is known: as shown in the seminal 
work of Amit, Gutfreund and Sompolinsky [31], the number of memories that can be stored in 
a classical Hopfield network [12] is about 0.14 times the number of neurons. For slightly more 
realistic networks the answer is also known [16,19,21,27,32-38]. However, even these more 
realistic studies lacked biological plausibility in at least one way: connectivity was all-all rather 
than sparse [19, 21, 33, 38], the neurons were binary (either on or off, with nothing in between) 
[16,19,21,32,33,37], there was no null background [16,32,33,35,37,38], the firing rate in the 
foreground state was higher than is observed experimentally [16, 27, 32, 33, 36, 37], or the coding 
level was very small [27, 36]. 

Here we answer both questions: we show, for realistic networks of spiking neurons, how ir- 
regular firing can be achieved, and we compute the storage capacity. Our analysis uses relatively 
standard mean field techniques, and requires only one assumption: neurons in the network fire 
asynchronously. Given this assumption, we first show that neurons fire irregularly only if the 
coding level is above some threshold, although a feature of our model is that the foreground neu- 
rons are slightly more regular than the background neurons. We then show that the maximum 
number of memories in our network - the capacity - is proportional to the number of connec- 
tions per neuron, a result that is consistent with the simplified models discussed above. These 
predictions are verified with simulations of biologically plausible networks of spiking neurons. 
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1 Model 

To address analytically the issues of irregularity and storage capacity in attractor networks, 
we consider a model in which neurons are described by their firing rates. Although firing rate 
models typically provide a fairly accurate description of network behaviour when the neurons are 
firing asynchronously [39,40], they do not capture all features of realistic networks. Therefore, 
we verify all of our predictions with large-scale simulations of spiking neurons. 

Our network consists of two populations, one excitatory and one inhibitory, with Ne neurons 
in the former and Nj in the latter. (In general we use E for excitation and I for inhibition.) We 
represent the firing rate of the i th neuron in pool Q(= E, I) by vqi. As we show in Appendix I, 
and discuss below, the time evolution equations for the firing rates are given by 



T E = F E (tlEi) ~ V Bi 

dUli TP th \ 



(la) 
(lb) 



where te and 77 are the excitatory and inhibitory time constants, Kqi is the synaptic input to 
the i th neuron in pool Q, and Fq{K) is a function that tells us the steady state firing rate of 
a neuron receiving synaptic input h. This function, which has a relatively stereotyped quasi- 
sigmoidal shape, can be determined analytically (or semi-analytically) for specific noise models 
[41-43], and numerically for more realistic models [40]. The synaptic drive, h,Qi, is related to 
the activity of the presynaptic neurons via 

N E JVj 

where J® R is the synaptic weight from the j th neuron in pool R to the i th neuron in pool Q and 
h,Q ex is the external, purely excitatory, input to neurons in pool Q. Finally, the steady-state 
firing rate of each neuron is determined by setting dv E i/dt and dv^jdt to zero, yielding the 
equation 



VQi = F Q (h Q i) . 



(3) 
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The bulk of our analysis focuses on solving Eq. (3); we use the dynamics, Eq. (1), only when 
investigating stability. Our goal is to determine the conditions that support retrieval states - 
states such that subpopulations of neurons have elevated firing rates. 

Since the gain functions, Fq(K), that we use in Eq. (1) play such a central role in our 
analysis, we briefly justify them here; for additional details, see Appendix I. These gain functions 
come from an average over the fast temporal fluctuations of the synaptic input - basically, 
filtered spikes. Calculating the temporal fluctuations self-consistently is a hard problem [44] but, 
fortunately, it's not a problem we have to solve. As we show in Appendix I, in the limit that 
each neuron receives a large number of connections, the temporal fluctuations experienced by all 
the excitatory neurons have the same statistics, as do the temporal fluctuations experienced by 
all the inhibitory neurons. Thus, we can use a single function, Fe{H), for the excitatory neurons, 
and another function, Fj(h), for the inhibitory ones. Of course, we won't be able to calculate 
the shape of Fq without knowing the structure of the temporal fluctuations. However, as we 
show below, the precise shapes of the gain functions don't play a strong role in our analysis. 

1.1 Connectivity 

The main determinant of network behaviour, at least in this model, is the set of connection 
strengths, the jQ R (as we will see, single neuron properties affect the quantitative details, but 
do not play a strong role in the qualitative behaviour). To choose connection strengths that 
will lead to attractors, we build on the model proposed by Hopfield over two decades ago [12]. 
In that model, random patterns are stored via a Hebbian learning rule, so connection strengths 
among neurons have the form 



where Aij is the strength of the connection from neuron j to neuron i, = 1 if neuron i 
participates in pattern fi and = otherwise, (3 is a constant that determines the memory 
strength, and p is the number of patterns. For each neuron, the probability of participating in 
a given pattern, fx, is equal to the coding level, which we denote a. Thus, 



p 




(4) 




1 with probability a 



(5) 



= < 



with probability 1 — a . 
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With this definition, the term — a) in Eq. (4) ensures that, on average, J2j ^ij 1S zero. Thus, 
the learning rule does not change the total synaptic weight onto a neuron, a form of postsynaptic 
normalisation that has been observed experimentally in cultured networks [45,46]. 

While Eq. (4) produces a network that exhibits attractors, it is inconsistent with biology 
in at least two important ways. First, the neurons can exhibit both excitatory and inhibitory 
connections (for fixed presynaptic neuron j, Ay can be positive for some postsynaptic targets i 
and negative for others), which violates Dale's law. Second, connectivity is all to all, which is 
inconsistent with the sparse connectivity seen in cortex [47]. Both can be fixed by introducing 
sparse, random background connectivity among excitatory and inhibitory neurons, and adding 
a threshold so that neurons are either excitatory or inhibitory, but not both. This yields a set 
of connection strengths of the form 



jEE 

j ij — 


c ij E [JEE + Aij 


JIE 

J ij — 


n lE T 

Cij JIE 


JEI 

J ij — 


n EI T 

Cij JEI 


J 11 ~ 


Cij J II , 



(6a) 
(6b) 
(6c) 
(6d) 



where the Jqr set the background connection strengths (with, of course, Jee and Jje positive 
and J ei and Jjj negative), [•]+ is the threshold-linear operator ([x] + = x if x > and 
otherwise), and c® R is the probability that neuron j of type R connects to neuron i of type Q. 
We assume that the connection probability is independent of type, so 



with probability c 

(7) 

with probability 1 — c . 




With this connectivity matrix, every neuron in the network projects to, on average, Ke excita- 
tory and Kj inhibitory neurons, and every neuron receives, on average, Ke excitatory and Kj 
inhibitory connections, where 
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K E = cN E 



(8a) 



Kj = cNj. 



(8b) 



The probability of connection, c, is assumed to be much smaller than 1, leading to a sparsely 
connected network [47], and it is independent of the size of the network unless otherwise stated. 
While we could have made the connectivity scheme more general by letting the connection 
probability between neurons depend on their type and/or by letting the nonzero c^- R in Eq. (7) 
have some variability, this would merely add complexity without changing any of our conclusions. 

Although we are including the threshold-linear operator in Eq. (6) (and also in the sim- 
ulations) we neglect it in the forthcoming theoretical analysis. This is because tends to 
be small: Its mean is zero and, as we discuss in section 2.5 and Appendix II, its variance is 
0(p/K E ). Thus, as long as p is sufficiently small compared to K, the threshold-linear operator 
can be neglected. For our model, we find that p/K is at most about 0.01, which means that the 
threshold-linear operator is unlikely to have much effect. Importantly, even if p/K were large, 
the scaling relation that we derive for storage capacity, i.e. p max ~ K, would still be correct; 
the only effect would be a slight modification to the precise value of p max /K [16]. 

2 Network equilibria 

As discussed above, much of our focus in this paper is on solving Eq. (3). For even moderate 
size networks, this corresponds to solving thousands of coupled, highly nonlinear equations, and 
for large networks that number can run into the millions. We do not, therefore, try to find a 
particular solution to this equation, but instead look for a statistical description - a description 
in terms of probability distributions over excitatory and inhibitory firing rates. The main tool 
we use is self-consistent signal-to-noise analysis [48, 49]. The idea behind this analysis is to treat 
the synaptic input (h E i and hn in Eq. (3)) as Gaussian random variables. Solving Eq. (3) then 
reduces to finding, self-consistently, their means and variances. 

Because h E i and hn consist of 2K (very weakly) correlated terms, where 



K = 



K E + Kj 
2 
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naive central limit arguments tell us that the standard deviations of these quantities should be 
smaller than their means by a factor of K 1 / 2 . It would seem, then, that in the kinds of high 
connectivity networks found in the brain, where K is on the order of 5,000-10,000, neuron to 
neuron fluctuations in firing rate would be small, on the order of K~ 1 / 2 . By the same reasoning, 
temporal fluctuations in the firing rates would also be small, again on the order of K~ 1 / 2 . Neither 
of these, however, are observed in biological networks: there are large fluctuations in firing rate 
both across neurons and over time [24, 50-53] . 

To resolve this apparent contradiction, one need only notice that \ie% and hn consist of both 
positive and negative terms (the first and third terms in Eq. (2) are positive; the second is 
negative). If these terms approximately cancel - to within 0(K~ x l 2 ) - then both the mean and 
standard deviation of the synaptic drive will be on the same order, and network irregularity 
will be restored. As showed by van Vreeswijk and Sompolinsky in a groundbreaking set of 
papers [25,54], under fairly mild conditions this cancellation occurs automatically, thus placing 
networks very naturally in what they called the balanced regime. In this regime, fluctuations 
across both neurons and time are large. Whether networks in the brain really operate in the 
balanced regime is not completely clear, although recent experimental evidence has come down 
strongly in favour of this hypothesis [55, 56]. 

While the work of van Vreeswijk and Sompolinsky was extremely important in shaping our 
understanding of realistic recurrent networks, their focus was primarily on random connectivity. 
The situation, however, is more complicated in attractor networks. That's because these net- 
works consist of three classes of neurons rather than two: background excitatory neurons and 
background inhibitory neurons, as found in randomly connected networks, but also foreground 
excitatory neurons. Our goal in the next several sections is to understand how all three classes 
can be balanced, and thus fire irregularly. 

2.1 Strong synapses and the balanced condition 

A reasonable constraint to place on our theoretical framework is that, in the large K limit, our 
results should be independent of K. This suggests that the synaptic strength, the Jqr in Eq. 
(6), should scale as K~ x l 2 . With this scaling, the mean value of the positive and negative terms 
in h,Ei and hu become 0{K x l 2 ), with cancellation these terms are 0(1), and the variance is 
also 0(1). Thus, if the gain functions, the Fq{K) in Eq. (3), are also 0(1), our results will be 
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independent of the number of connections. To make the K l l 2 scaling explicit, we define a new 
set of synaptic strengths and external input, which we denote Jqr and h,Q ex , respectively, 



~ K^J QR 

Jqr = — ( ^ 

h Qex = K l ' 2 h Qex (9b) 



where Jqr and h,Q ex are both 0(1) and, recall, Kr = cNr (Eq. (8)). 

Equation (9) tells us how to scale the background connectivity, but it does not directly 
apply to the part of the connection matrix associated with memories, A^. To determine how 
Aij should scale, we need only note that the mean contribution from the memories should be 
0(1) - sufficiently large to have an effect, but not so large as to overwhelm the background. 
Consequently, Ay should scale as 1/K (see Appendix II for details), which we can guarantee by 
defining a new variable, /3, via the relation 



Keci(1 — a) 

where is 0(1) and the factor a(l — a) is for convenience only. 
2.2 Mean field equations for the retrieval states 

Now that we have the "correct" scaling - scaling that makes our results independent of network 
size and ensures that the mean and variance of the synaptic input are both 0(1) - we can 
apply self-consistent signal-noise analysis to Eq. (3). The first step is to divide the excitatory 
and inhibitory synaptic currents (Kei and ha) into two pieces: one that is nearly independent 
of index, i (the "mean"), and one that is a random variable with respect to i (the fluctuating 
piece). To do that, we rewrite the synaptic current in terms of our new variables, Jqr and /3, 
rather than J9 R and 0. Combining Eqs. (4), (6), (9) and (10) with Eq. (2), we have 



(10) 



11 



h El = K 1 ' 2 
h A = K x l 2 



n r 



R R 3=1 



M =ij=i 

(lib) 



P Nfi 



R j=l 



Note that Eq. (11) is identical to Eq. (2); it is just expressed in different variables. 

For the terms in brackets, the mean and fluctuating pieces are easy to compute: the mean 
comes from replacing by its average, c, and the fluctuating piece comes from replacing 
c ij R by the residual, c^- R — c. For the second term in Eq. (11a), separating the mean from 
the fluctuating piece is harder, as there is a nontrivial dependence on i associated with the 
p memories. Ultimately, however, we are interested in the case in which only one memory is 
retrieved, so when computing the mean we can consider only one term in this sum on \x\ the 
other p — 1 terms contribute only to the fluctuations. Assuming, without loss of generality, 
that the first memory is retrieved, averaging over the randomness associated with the sparse 
connectivity allows us to replace cfj E with c, and we find that the mean of the last term in Eq. 
(11a) is proportional to 

Putting all this together, we arrive at the eminently reasonable result that the mean exci- 
tatory and inhibitory synaptic currents are linear in the mean excitatory and inhibitory firing 
rates, and the mean excitatory current has an extra, memory induced, dependence proportional 
to Dropping the superscript "1" (a step taken only to simplify the equations), we find that 
the synaptic current may be written 



h Ei = h E + ^i(3m + 5h Ei 
hn = h T + Shn, 



(12a) 
(12b) 



where h E and hi are the averages of the terms in brackets on the right hand side of Eq. (11), 
£i/3m is the mean contribution from the first memory, and bh E { and Sh^ contain everything else. 
More specifically, the terms in Eq. (12) are as follows. First, Ke and hj are given by 
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h E = K x l 2 {J EE v E + J m v I + h Eex ) 
hj = K 1/2 (J IE is E + J u v, + /i Je:r ) , 



(13a) 
(13b) 



where v E and Vi are the firing rates averaged over the excitatory and inhibitory populations, 
respectively, 

i 

Second, the overlap, m, which is proportional to the mean firing rate of the foreground neurons 
relative to v E , is given by 



m =N E atl-a)^- a) "»- (15) 

i 

Expressions for the fluctuating terms, Sh E i and Shn, are given in Eqs. (II-2) and (H-3). Because 
these terms contain everything not contained in the mean terms, Eq. (12) is exact. 

The three quantities v E , v t and m are our main order parameters. To determine their values 
self-consistently, we express the firing rates, v Ei and v H , in terms of the synaptic currents using 
Eq. (3), and insert those expressions back into Eqs. (14) and (15); that leads to 



v E = jrj- F E (h E + iifim + Shsi) (16a) 

i 

m = — — Y7& - a)F E (h E + tifim + Sh Ei ) (16b) 

iv Ed{ L — a) — * 

i 

Vi=Jj-Y, F i( h ' + Sh *)- ( 16c ) 
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To solve these equations, we use the fact that there are a large number of neurons; this 
allows us to turn the sum over i into an integral over the probability distributions of 5 He and 
Shi, denoted p{8Ke) and p(hj), respectively. Replacing the sum by an integral in Eq. (16), and 
also averaging over the mean field equations become 



u E = J dSh E p(5h E ) (F E (h E + ^(5m + 5h E ))^ 

m = [ dSh E p(5h E ) (— — ^F E (h E + £j3m + 6h E ) 
J \a(l-a) 

u I = J d5hj piShj) F^hj + Shi) , 



where the subscript on the angle brackets indicates an average with the statistics given in Eq. 
(5). 

Because both Sh E and Shi are Gaussian random variables (see Appendix I), these integrals 
are reasonably straightforward; what makes them at all difficult is that the variance of 5h E and 
5hj must be found self-consistently. This results in two more equations, for a total of five (see 
Eq. (II-8)). This is still far fewer than our original set of 1000s or more. And the situation gets 
even better: it turns out that we really only need to consider three, at least if all we want to do 
is gain qualitative insight into how attractor networks function. That's because the integrals are 
simply Gaussian convolutions, so all they do is smooth the gain functions. Using a bar to denote 
the smoothed functions, and performing the average over £ (which is straightforward because it 
has simple 0/1 statistics; see Eq. (5)) we have 



v E = (1 - a)F E {h E ) + aF E {h E + (5m) 
m = F E {h E + (5m) - F E {h E ) 
u I = F I (h I ). 



(18a) 
(18b) 
(18c) 



These equations - which are identical in form to the ones derived in [23] - are oversimplified 
versions of the full mean field equations. Basically, the bar over F hides a dependence on two 
additional order parameters - the second moments of the excitatory and inhibitory firing rates 
- which in turn depend on our main order parameters, v E , v u and m. While these dependencies 
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are important for making detailed predictions, for an intuitive picture of what the mean field 
equations mean they can be ignored. Consequently, in the next several sections we focus on Eqs. 
(18a-c), which we refer to as the reduced mean field equations. At the end of the next section, 
we argue that, under very general conditions, all the conclusions we draw based on the reduced 
mean field equations apply to the full set (which are given in Eq. (II-8)). 

2.3 Reduced mean field equations in the infinite K limit 

When solving the reduced mean field equations we have a choice: we can think of them as 
functions of v E , Vj and m, or as functions of h E , h r and m. Since v E and v t are related to h E and 
hi via an invertible transformation - Eq. (13) - the two prescriptions are identical. The latter, 
however, turns out to be more convenient, especially in the infinite K limit. To see why, we 
need only solve Eq. (13) for the mean firing rates, which yields 



v E = v B0 + K 1/2 D 1 [Jnh E - J E jhi} 
vi = vio + K'^D' 1 [J EE hi - Ji E h E \ 



(19a) 
(19b) 



where 



v eo — D [J E ihi ex Jnh Eex ] (20a) 
u I0 = D' 1 [J IE h Eex - J EE h Iex ] (20b) 



are the mean firing rates in the infinite K limit and 

D = J EE J a J E iJ i E (21) 

is the determinant of the background connectivity matrix; as shown in [54] and Appendix III, D 
must be positive for the background to be stable. Since we are in the balanced regime, He and 
hi are 0(1). Consequently, in the infinite K limit, the mean excitatory and inhibitory firing 
rates are simply given by v E0 and v I0 , respectively, independent of hE and hi. Using this fact, 
the reduced mean field equations, Eq. (18), become, in the K — ► oo limit, 
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u E0 = (1- a)F E (h E ) + aF E (h E + (3m) 
m = F E (h E + (3m) - F E {h E ) 
v I0 = F^hj) . 



(22a) 
(22b) 
(22c) 



An important feature of these equations is that /i 7 decouples from h E and m. This greatly 
simplifies the analysis, since it means we can find the equilibrium value of h l simply by inverting 
Fi. 

Our approach to finding the equilibrium values of h E and m is a graphical one: we plot, in 
h E -m space, the two curves that correspond to the solutions to Eqs. (22a) and (22b) - the h E 
and m nullclines, respectively - and look for their intersections. The goal is to determine the 
conditions under which there are multiple intersections, with at least one of them corresponding 
to an equilibrium with m > 0, and thus to a retrieval state. 

To be as general as possible, we make only two assumptions: F E (h) is monotonic increasing, 
and it is quasi-sigmoidal, where we use "quasi-sigmoidal" to mean convex (F E (h) > 0) for small 
h and concave {F' E (h) < 0) for large h. (Note that F E (h) need not saturate.) This immediately 
tells us something about the shape of the nullcline: since the right hand side of Eq. (22a) 
is an increasing function of both h E and m, its solution, h E (m), must have negative slope (i.e., 
dh E /dm < along the h E nullcline). Typical plots of the /i^-nullcline are shown in Fig. la 
for two values of the coding level, a. Note that the nullcline curves upward in this plot, a 
consequence of the fact that we use -h E rather than h E on the y-axis. 

To find the m-nullcline - the set of points in h E -m space that satisfy Eq. (22b) - we proceed 
in two stages. First, we plot the right hand side of Eq. (22b) versus m and look for intersections 
with the 45° line; these intersections correspond to points on the m-nullcline. Second, we vary 
h E and sweep out a curve in h E -m space; this curve is the full m-nullcline. A typical plot 
versus m with h E fixed is shown in in Fig. lb. There are three intersections with the 45° line, 
which means that the m-nullcline consists of three points at this particular value of h E : one 
with m = and two with m > 0. To find out how these three points move as we vary h E , we 
compute dm(h E ) / dh E where the derivative is taken along the m-nullcline; using Eq. (22b), this 
is given by 
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(a) (b) 




Figure 1: Generic shapes of the nullclines. Note that these are "cartoons," and thus do not 
apply to any particular model; for nullclines derived from a specific model, see Fig. 2. (a) 
/i B -nullcline versus m for two different value of a (a is small for the dashed curve and large for 
the solid curve). Note that we use -h E on the y-axis, so the upward curvature indicates that 
the total synaptic drive to a cell decreases with m. (b) Right hand side of Eq. (22b) versus 
m with h E fixed. The intersections with the 45° line correspond to points on the m-nullcline. 
(c) The m-nullcline. The precise shape isn't so important; what is important is that the part 
of the nullcline not on the m = axis has the topology of a circle. Insets indicate the portion 
of F E {h E ) that contributes to the m-nullcline; see text, (d) The m- and /i^-nullclines on the 
same plot. The intersections correspond to network equilibria. There are three equilibria: one 
at m = 0, corresponding to the background state, and two at m > 0, corresponding to potential 
retrieval states. The one at m = and the one at large m are stable; the intermediate one is 
not. Consequently, only the large m equilibrium is observed during retrieval. Note that when 
the coding level, a, is small (dashed blue line), the retrieval state occurs at large m, and thus 
high firing rate. Only when a is large (solid blue line) is it possible to have low firing rate during 
retrieval. 
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dm(hE) 



d[F E {h E + (5m) - F E {h E )]/dh E 



(23) 



dh E 



1 - d[F E (h E + /3m) - F E (h E )]/dm ' 



We are primarily interested in the sign of dm/dh E , which can be found by examining the signs 
of the numerator and denominator separately. For the denominator, note that the derivative of 
the term in square brackets is the slope of the curve in Fig. lb. Consequently, the denominator 
is negative for the intermediate intersection (where the slope is greater than 1) and positive 
for the upper intersection (where the slope is less than 1). The sign of the numerator depends 
primarily on the size of h E . If h E is small, so that both F E (h E + (5m) and F E (h E ) lie on the 
convex part of the sigmoid, then the numerator is positive. If, on the other hand, h E is large, 
so that F E (h E + (5m) and F E (h E ) lie on the concave part, then it is negative (see insets in Fig. 



This gives us the following picture: when h E is small, so that the numerator in Eq. (23) is 
positive, decreasing h E causes the two intersections in Fig. lb to move closer, and eventually 
annihilate. When h E is large, on the other hand, so that the numerator is negative, increasing, 
rather than decreasing, h E causes the intersections to move closer, and eventually annihilate, 
this time for sufficiently large h E . Filling in the points away from the extrema, we see that the 
m-nullcline is topologically equivalent to a circle (Fig. lc). Finally we note that the line m = 
is also part of the nullcline, as can easily be seen from Eq. (22b); this line is also included in 



In Fig. Id we combine the /iE-nullclines from Fig. la and the m-nullcline form Fig. lc. Clearly 
there is always an equilibrium at m = 0, corresponding to no active memories; i.e., corresponding 
to a null background. There are also two equilibria at m > 0, corresponding to active memories. 
In Appendix III we show that the one at larger m is stable. Importantly, this equilibrium can 
occur at small m, and thus low firing rate, something we will see more quantitatively in the next 
section, where we consider a specific example. Although not shown in Fig. 1, the m-nullcline 
can shift far enough up so that m can be negative at equilibrium. When this happens, m = 
becomes unstable, which in turn implies that the background becomes unstable. We see this in 
the simulations: when (5 becomes too large, memories are spontaneously activated. 

We can now see the critical role played by the coding level, a. In the limit a — > 0, the right 
hand side of Eq. (22a) becomes almost independent of m. This makes the /i^-nullcline almost 



lc). 



Fig. lc. 
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horizontal (dashed line in Fig. Id), so the only stable retrieval state occurs at large m, and 
thus high firing rate (the peak of the m-nullcline typically occurs near the maximum firing rate 
of the neurons, about 100 Hz; see next section). If, on the other hand, a is reasonably large, 
then the /i£?-nullcline can curve up and intersect the m-nullcine to the left of its highest point 
(solid blue line in Fig. Id). As just discussed, this intersection corresponds to the intermediate 
intersection in Fig. lb, which means it corresponds to low firing rate, and thus a biologically 
realistic retrieval state. 

We end this section by discussing the conditions under which the nullclines in Fig. Id, which 
were derived from Eq. (18), are the same as the nullclines for the full mean field equations, Eq. 
(II-8). The primary effect of the full set of equations is to couple hi to He and m. One could, 
however, solve for hi in terms of hE and m, insert that solution into the equations for hE and 
m, and derive a new coupled set of equations that again involve only Ke and m. This would, 
effectively, replace Fe in Eq. (18) with a more complicated function of hE and m. Examining 
Eqs. (II-8) and (II-9), we see that these manipulations would result in the following replacements, 

FE{h E ) -> F E (hE,aE(h E ,m)) 
Fe^e + (3-m) — »■ F E (hE + (3m, OE{h E , m)) . 

Retracing the steps that led us to Fig. Id, we see that if Fe{He, &E(hE, m)) and Fe(He + 
(3m, aE(hE, m)) are quasi-sigmoidal functions of He and m, we recover the nullclines in Fig. Id. 
Both of these conditions are likely to hold for real neurons, since increasing He and m correspond 
to increasing excitatory drive. Thus, for neurons with reasonable gain functions, we expect Fig. 
Id to fully capture the shape of the nullclines. 

2.4 An example: nullclines for a simple gain function 

As an illustrative example, we consider a specific form for the gain functions (the Fq), and 
compute the resulting nullclines numerically. The form we choose is a rather standard one, 



F Q (h) = VmsK K(h/<r Q ) (25) 

where f max is the maximum firing rate of both excitatory and inhibitory neurons, which without 
loss of generality we take to be 100 Hz, H(x) is given by 
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H(x) = 



1 



(26) 



1 + exp(— x) ' 



and 



erg is is an approximate standard deviation based on Eq. (H-5), 




R n 

Before computing the nullclines for these gain functions, we introduce a transformation that 
changes the nullclines without changing the equilibria. Combining Eqs. (22a) and (22b), we see 
that Eq. (22a) can be written 



Note that the right hand side of Eq. (27) is an increasing function of both He and m, so the 
/i^-nullcline based on Eq. (27) has the same qualitative shape as the /le-nuilcline based on Eq. 
(22a). This form is more useful than the one in Eq. (22a), however, because we can immediately 
write down an expression for ^(m), 



Computing the nullclines is now a straightforward numerical task, and in Figs. 2a-d we plot 
the m- nullclines (green) for increasing values of (3 and the He- nullclines (blue) for two different 
values of the coding level, a. Because the m-nullcline does not depend on a (see Eq. (22b)), 
there is only one m-nullcline in each panel. 

The first thing we notice is that when (3 is sufficiently small (Fig. 2a), the m-nullcline 
consists only of a line at m = 0, which means that the only possible equilibria are at m = 0, and 
so retrieval states are not possible. When [3 is slightly larger (Fig. 2b), the m-nullcline gains a 
second piece away from the line m = 0. However, this second piece lies below both /i^-nullclines, 
so the only intersections are again at m = 0, and retrieval is again not possible. The fact that 
there is no memory retrieval when (3 is small makes sense: (3 controls the connection strength 
among the neurons within each memory, so if it is too small there will not be enough recurrent 
connectivity to produce elevated firing. 

For still larger (3, there is an intersection with one of the h e- nullclines - the one corresponding 
to low coding level (Fig. 2c). The stable equilibrium, which is the equilibrium with larger m, 



= F E (h E ) + am . 



(27) 



hE(m) = F E 1 (i/ B0 - am). 



(28) 



20 



(a) 



ui 

£ iooh 



50 - 



-100 



o- 



-50 



-100 



a=0.05 



P=0.1 



-50 




m 

(c) 



50 



■100 



■50 




m 



50 



— i— 
100 



UJ 

£ 100- 

1 

50 - 

a=0.001 




a=0.05 


~°" 






p=0.5 


-50- 




-100 - 

— 1 1 


1 1- 



100 



(b) 



UI 

100- 

i 




a=0.05 


50 - 

a=0.001 








p=0.25 


-50 - 




-100 - 

i i 


i i 



■100 



■50 




m 



50 



100 




Figure 2: /i B -nullcline and m-nullcline for the gain function given in Eqs. (25) and (26). Different 
panels correspond to different values of /3, and in all of them two /i£-nullclines are shown: one 
with a = 0.001 (dashed blue line) and one with a = 0.05 (solid blue lines). The m-nullcline does 
not depend on a (Eq. (22b)). The parameters were J EE = J m = 1, J m = —1.9, J H = —1.5, h Eex = 
3,h Iex = 2.1, which implies, via Eqs. (20) and (21), that v E0 = 1.3 Hz. (a) = 0.1. The 
m-nullcline consists only of a line at m = 0, so there can be no memory retrieval states, (b) 
(3 = 0.25. The m-nullcline gains a second piece away from m = 0, but there are still no equilibria 
with nonzero m, and thus no retrieval states, (c) (3 = 0.5 The m-nullcline now intersects one 
of the /i£-nullclines - the one with small coding level, a. (d) = 1.2. There are now three 
intersections for both values of a. The ones with m = and large m are stable; the one with 
intermediate m is unstable (see Appendix III). The /i^-nullcline with a = 0.001 is essentially 
a straight line, so memory retrieval occurs at a firing rate that is too high to be biologically 
realistic. The /i^-nullcline with a = 0.05, on the other hand, has strong upward curvature, so 
memory retrieval occurs at a much lower, and thus biologically plausible, firing rate. 
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corresponds to memory retrieval (Appendix III). Finally, at sufficiently large (3, the system 
acquires an intersection with the /i£-nullcline corresponding to high coding level (Fig. 2d). 
Again, the stable equilibrium is the one with larger m. 

An important point is that the value of m at the retrieval state, and thus the firing rate 
of the foreground neurons, depends strongly on the coding level, a. For small a (dashed blue 
line), retrieval occurs near saturation, and thus at an unrealistically high firing rate. For larger 
a (solid blue line), the retrieval occurs at low firing rate, consistent with experiments (when 
a = 0.05 and j3 = 1.2, the equilibrium value of m is 20 Hz). This is exactly the behaviour we 
saw in the previous section. 

As can be expected from these figures, increasing j3 even further would shift the intermediate 
intersection to negative values of m. In this regime the background becomes unstable. Again 
this makes sense: if the coupling among the neurons within a memory is too strong, they 
become spontaneously active. Examining Fig. lb, we see see that this occurs when the slope of 
FE(h E + (3m) — FE(h E ) with respect to m is 1 at m = (and, of course, Ke is at its equilibrium 
value). The value of j3 at which this happens, denoted /3 max , is given by 

0max = F'{F-\v E0 )Y 

(see Eqs. (22b) and (28)). For the sigmoidal gain function used in this example (Eq. (26)), /3 m ax 
is given by 

Pmax = ^ ( 1 " ^ V 1 • (29) 

The phase diagram for this model - a plot showing stability and, in the stable region, the firing 
rate of the foreground neurons - is shown in Fig. 3. 

2.5 Storage capacity 

In the above analysis there was no way to determine how many memories could be embedded in 
a network, and thus no way to determine storage capacity. That's because we hid all effects of 
the quenched noise - the noise associated with the random elements of the connectivity matrix 
- in F B and F l (see Eq. (18)). However, the quenched noise can have a nontrivial effect, in two 
ways. First, within the context of the self-consistent signal-to-noise analysis, it changes both F E 
and Fj, and thus modifies the nullclines. Second, and potentially more important, as we add 



22 




0.02 0.04 0.06 0.08 0.1 
a 

Figure 3: Phase diagram showing the values of a and (5 which exhibit both a stable background 
and memory retrieval. The firing rate (in Hz) of the foreground neurons is indicated by the 
color bar on the right. Below the colored region only the background exists, and it is stable. 
Above the colored region the background is unstable. The upper boundary is defined through 
Eq. (29); the lower boundary is determined numerically by finding the minimum value of (for 
a given a) such that the m-nullcline and /i B -nullcline intersect. The parameters are the same as 
in Fig. 2: J EE = J IE = 1, J EI = -1.9, J H = -1.5, h Eex = 3, h Iex = 2.1. 
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memories we increase the number of preferred modes that can be activated in the network, and 
thus we increase the quenched noise. Either effect could cause memories to be active when they 
should not be, and inactive when they should be. 

To quantify these effects, we note that both scale with the fluctuations associated with the 
memories that are not recalled on any particular trial. The size of these fluctuations can be 
found by computing the contribution of the memories to Sh,E, the fluctuating piece in Eq. (12a). 
Examining the memory portion of the connectivity matrix, Aij, which is given in Eq. (4), and 
noting that (3 is proportional to K^ 1 (Eq. (10)), we show in Appendix II that the variance of 
the quenched fluctuations associated with this term scale as p/Ke (Eq. (H-6)). Intuitively that 
is because when we sum the right hand side of Eq. (4) on j and [i there are (p— 1)Ke terms: 
Ke that come from the sum on j, and p—1 that come from the non-activated memories in the 
sum on fj,. Each of these terms has variance that is 0(1/ Kg). Central limit type arguments 
then tell us that the variance of such a sum is on the order of (p— \)Ke/K\. « p/Ke, where the 
approximation is valid if p is large. Consequently, there is a critical value of p/Ke above which 
none of the stored patterns could be retrieved. Thus, the maximum number of memories in a 
network should scale linearly with Ke- This is what we found in our simulations (Sec. 3). 

Unfortunately, the scale factor we found in our simulations was small, in that the maximum 
number of memories scaled as 0.01-K^. A natural question to ask, then, is: can the scale factor 
be improved by, for example, using different parameters in our network? In the rest of this 
section, we focus on the effect of the coding level, a, on the storage capacity. We choose the 
coding level because, at least in simple models, the storage capacity is inversely proportional to 
a [33,34,37]. We have already shown that as the coding level decreases the foreground firing 
rate becomes large, so we cannot make a arbitrarily small. However, the minimum allowable 
value of a depends on the model. What we show below, though, is that even for models which 
exhibit realistic foreground firing rate at relatively low coding levels, the 1/a scaling of the 
storage capacity does not hold. This suggests that decreasing the coding level cannot be used 
to increase the storage capacity in realistic networks. 

Examining Eqs. (12a) and (II-7), we see that the background neurons receive an input drawn 
from a Gaussian distribution with mean h E and standard deviation a E , while the foreground 
neurons receive input with larger mean, h E + (3m, and the same standard deviation, a E . When 
the standard deviation of these distributions, a E , is smaller than the separation between the 
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means, the two populations are well separated (Fig. 4a) and memory recall is possible. The 
standard deviation, however, is an increasing function of p; see Eq. (II-8d) and note that p enters 
this equation only through the storage load, a, which is defined to be 

as w E - < 30 > 

When a, and thus p, becomes large enough, the standard deviation is of the same order as the 
separation. At this point, the two distributions have a significant overlap with each other (Fig. 
4b) , and memory recall fails . 

Using this intuitive picture and Eq. (II-8d), we can find the value of a for which a E is on 
the order of the separation between the means; this should give us an estimate of the storage 
capacity, a max . Using Eq. (II-8d) and the fact that the means are separated by [3m (see Fig. 
4), we see that this happens when 



( J EE a max -. — (a 7 i + 72) + Jim ~ P m , (31) 

where 



J^E 

72 = ^{F 2 E (h E + a E z)) z 

J^-E 

l3 = Y i (Ff(h 1 + a 1 z)) z . 
Solving Eq. (31) for a max then leads to 



Otmax ~ (1 - a) 



- Jj /73 J 2 EE 
/3 2 (a7i+72) P 2 



If the background synaptic weights, J EE and J EI , were zero and there was zero background 
firing, so that 72 vanished, we would recover the 1/a scaling (in the small a limit) found in simpler 
models [33, 34, 37]. With nonzero background synaptic weights, however, the capacity no longer 
scales as 1/a. Consequently, we expect that the maximum capacity cannot be improved much 
by using sparser codes. 
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Figure 4: Distribution of inputs to foreground (mean=/ig) and background (mean=/iE + f3m) 
neurons, and its relation to storage capacity. Both inputs have a Gaussian distribution. The 
means are separated by (5m and the standard deviation of both distributions is a E . (a) The 
standard deviation is much smaller than the distance between the means of the two distributions. 
In this regime, the two populations are well separated, there is no interference between them, 
and memory retrieval is supported, (b) As a increases, a E also increases (Eq. (II-8d)) while 
m changes rather slowly (Eq. (II-8b)), so the distributions start to overlap. When the overlap 
becomes large, noise dominates the signal, and memory recall is no longer possible. The value 
of a at which this happens is the storage capacity, a max . 
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3 Computer simulations 

Our mean field analysis gave us two predictions. The first is that if the background synaptic 
weights, the J, scale as K^ 1 / 2 , the foreground weights, A, scale as if -1 , and the coding level, 
a, is sufficiently high, then both the background and foreground neurons should operate in 
the balanced regime and the neurons should fire irregularly The second is that the number of 
memories that can be stored is proportional to the number of excitatory connections per neuron, 
K E . 

To test these predictions, we perform simulations with large networks of spiking neurons. 
We start by finding, for a particular network size, parameters such that both foreground and 
background neurons exhibit irregular activity. We then increase the size of the network while 
scaling the synaptic weights according to the above prescriptions. If the larger networks continue 
to exhibit irregular activity, then our predicted scalings are correct. To test the relation between 
storage capacity and number of connections per neuron, we calculate the storage capacity for 
networks with different sizes. A linear relation would indicate a scaling consistent with our 
predictions. 

3.1 Network model 

Each neuron is modeled as a conductance-based quadratic integrate and fire (QIF) neuron. 
Dendritic trees and axonal arborizations are not considered. The spikes generated in any neuron 
immediately affect all the postsynaptic neurons connected to it. The membrane potential of 
neuron i of type Q, denoted Vqi evolves according to 



dV Ql (V Ql -V r )(V Ql -V t ) 

T — 7— = — — 1" VQi 



dt V t - V r 

- (VQi ~ £ E ) £ J^Sijit) ~ (V Qi - Sj) J?/**® + he 

'" Si i ± j , 

k 



(32a) 



dt 



' .s 



(32b) 



Here r is the membrane time constant, t s is the synaptic time constant, V r and Vt are the nominal 
resting and threshold voltages, Vqi determines the actual resting and threshold voltages, (Vbj is 
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constant for each i, but as a function of i it's a Gaussian random variable with mean Vq and 
standard deviation AVq), J® R is the connection strength from cell j in population R to cell i 
in population Q, 8e and £j are the excitatory and inhibitory reversal potentials, respectively, 
the notation j G R means sum over only those cells of type R, <5(-) is the Dirac 5-function, t k - is 
the k th spike emitted by neuron j, and h,Q e x,i is the external input to neuron i of type Q. The 
external input is modeled as 

hQex,i = (VQi — £E)JQexSQex(t) (33a) 
dSq cx S t 



dt 



where the t^ ex are the times of the external spikes. These are taken to be Poisson at constant 
rate v Q 

ex' 

There are two features of these equations that are worth commenting on. First, the con- 
nection strengths, jfj R , are completely analogous to the ones given in Eq. (2). Thus, although 
the J® R in Eq. (32) have different numerical values than those in Eq. (2), they should have 
the same scaling with connectivity [40,44, 57]. The same is also true of JiQ ex , except that here 
fiQex has temporal fluctuations whereas in Eq. (2) it does not. Second, we have included a term 
Voi, which has the effect of making the resting membrane potential and threshold of each cell 
different. This was not explicitly modeled in our mean field analysis, although it would not have 
made much difference - it would have only added to the quenched noise. 

The have the same form as in Eq. (6), except that we introduce an extra scaling factor 
so that connection strengths can be directly related to PSP size. Specifically, we use the fact 
that if neuron j spikes and neuron i is at rest, then the PSP generated at neuron i will have 
peak amplitude Vr where 

VR 



(r/r s )exp[ln(r/r s )/(r/r s -l)] ' 



see [58] for a derivation of this expression. This suggests that we should scale our connection 
strengths by Vr, so we write 



QR 
JQR _ c ij 

V ~ Vr 



Jqr + S q , e Sr,ePJ2^^-^ 



(34) 
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where cf- R is the same binary random variable defined in Eq. (7), Sq^r is the Kronecker delta, 
and Jqr and f3 in Eq. (34) correspond to, but typically have different numerical values than, 
the ones in Eqs. (6) and (10). If Vr is in mV, then Jqr is the peak postsynaptic potential, in 
mV, that occurs in a neuron in pool Q when a neuron in pool R fires (assuming the two are 
connected, the postsynaptic neuron is at rest, and (3 = 0). 

Our analytical results have been derived by assuming current based neurons. However, it is 
possible to extend such analysis to a more realistic network of conductance based neurons by 
noting that the effective connection strength in a conductance based model is proportional to 
the PSP size [40,44,57]. Thus, for the network to operate in the balanced regime, we should 
have the following scalings, 



Jqr ~ K x / 2 , 

p ~ K, 
(5 ~ K^ 1 . 



(35a) 
(35b) 
(35c) 
(35d) 



Note that the mean external excitatory input must be proportional to K 1 ! 2 . Therefore, given 
Eq. (33a) and the scaling of JQ ex in Eq. (35b), the firing rate of the neurons that provide external 
input, VQex, should scale as K. 

We performed simulations using three different networks, called Networks 1, 2 and 3, that 
differ in the number of neurons (they contain a total of 10,000, 20,000 and 30,000, respectively). 
In all three networks c = 0.15, so K is proportional to the total number of neurons in the 
network. Because of the scaling in Eq. (35), the values of Jqr, (3, VQ ex and p also differ. The 
parameters for the three networks are given in Table I. Our goal in these simulations is to 
determine whether, as predicted by our mean field analysis, the above scaling leads to behaviour 
that is independent of K and the firing of both foreground and background neurons is irregular. 

3.2 Building a balanced network 

Our first step in assessing our mean field predictions is to build a network that operates in the 
balanced regime and supports retrieval states. To test whether a network is operating in the 
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Table I. Parameters used in the simulations, [^predicted (the only parameter not actually used 
in the simulations) is the predicted value of (3 based on our mean field analysis. 



Network number 


1 


2 


3 


Excitatory neurons 


8,000 


16,000 


24,000 


Inhibitory neurons 


2,000 


4,000 


6,000 


K{= K E + Kr) 


1,500 


3,000 


4,500 


Jee 


0.5 mV 


0.35 mV 


0.29 mV 


J EI 


1.0 mV 


0.71 mV 


0.58 mV 


JlE, Jll 


-4.0 mV 


-2.83 mV 


-2.31 mV 


^Eex 


0.5 mV 


0.35 mV 


0.29 mV 


J I ex 


1.0 mV 


0.71 mV 


0.58 mV 


V Bex 


1000 


2000 


3000 




450 


900 


1350 


P 


0.168 


0.101 


0.077 


/^predicted 


- 


0.083 


0.056 


V 


5 


10 


15 


V 


1.5 mV 


1.5 mV 


1.5 mV 


AV 


0.5 mV 


0.5 mV 


0.5 mV 


a 


0.1 


0.1 


0.1 


c 


0.15 


0.15 


0.15 


T 


10 ms 


10 ms 


10 ms 


T s 


3 ms 


3 ms 


3 ms 


V r 


-65 mV 


-65 mV 


-65 mV 


v t 


-50 mV 


-50 mV 


-50 mV 


£e 


mV 


mV 


mV 


Si 


-80 mV 


-80 mV 


-80 mV 


Time step 


0.5 ms 


0.5 ms 


0.5 ms 
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balanced regime, we rely on two indicators. One is that it exhibits irregular firing, quantified 
by the coefficient of variation (CV) - the ratio of the standard deviation to the mean interspike 
interval - and that the CV is independent of K. The second is that the mean excitatory and 
inhibitory firing rates scale linearly with the external input, as predicted by Eq. (20). To test 
whether a network supports retrieval states, we simply activate a memory by bombarding all 
the neurons within a memory with excitatory input, and ask whether the memory stays active 
for several seconds. Very little fine tuning was required to find a network that exhibited both 
balance and retrieval states: we simply chose reasonable peak PSPs, set the coding level, a, to 
0.1, and increased (3 until at least one memory was stored. 

In Fig. 5a we show an example of the retrieval of a stored pattern for Network 1. The first 
2 seconds in this figure consists of background firing; at t = 2 seconds, neurons selective for one 
of the patterns receive an excitatory external input lasting for 100 ms; and at t = 27.3 seconds, 
the same neurons receive an inhibitory external input, which again lasts for 100 ms. The blue 
line is the mean firing rate of the foreground neurons, the black line is the mean firing rate of 
the excitatory neurons (both foreground and background) and the red line is the mean firing 
rates of the inhibitory neurons. 

Two points are worth mentioning. One is that the background firing rate in our simulations 
is lower than the background firing rate observed in studies of delay activity, which range from 
1.5 to 8 Hz [15], although we should point out that the firing rates determined from extracellular 
recordings may be overestimated due to selection bias [59]. We could, however, achieve a higher 
background rate by increasing the excitatory external input; an example is shown in Fig. 6, for 
which the network parameters are the same as Network 1 (Fig. 5a) except that the external 
input to excitatory and inhibitory neurons is five times higher, (3 is a factor of about two higher, 
and there is just one stored pattern instead of five. With the higher input, the background 
and foreground rates are in the range reported from neurons in, for example, anterior ventral 
temporal cortex [1,3] and entorhinal cortex [15]. 

The second point is that during retrieval the mean firing rates of the excitatory and inhibitory 
neurons differ from the background rates; i.e., from the rates when no memories are activated. 
This appears to be inconsistent with the balance condition, which predicts that the mean firing 
rate during the activation of a memory is the same as that when the network is in the background 
state (see Eq. (20)). However, this prediction holds only in the limit of infinite connectivity. For 
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finite connectivity, there are corrections, and they are particularly important when the firing 
rate is low [54]. For example, in Fig. 5a the average excitatory activity increased from 0.28 
Hz in the background to 1.07 Hz during retrieval (an increase of about 400%), whereas in Fig. 
6, where the background is higher, it increased from 1.06 Hz to 1.73 Hz (an increase of 60%). 
Thus, the increase in the mean excitatory firing rate during retrieval is reduced when the firing 
rate is higher. However, this is accompanied, at least in the parameter range we looked at, by 
a decrease in the storage capacity. Since we would like to study the scaling of storage capacity, 
we operate in the lower firing rate regime. A detailed search of parameter space is required to 
determine whether both high storage capacity and high background firing can be achieved. 

In Fig. 7a we show the CV versus firing rate, again for Network 1. Here and in what follows, 
the CV is calculated only for those neurons that emit at least 5 spikes during the 25 second 
period that the pattern is active. The data in this figure fall into two clusters, one (blue dots) 
corresponds to background neurons and the other (red crosses) to foreground neurons. The 
distributions of CVs and firing rates are shown in Figs. 7b and c. The CV of both background 
and foreground neurons are on the order of 0.8, which indicates irregular firing. This suggests 
that the network is operating in the balanced regime. To further test for balance, in Fig. 8a we 
plot the average excitatory and inhibitory firing rates versus the external input. As predicted 
by Eqs. (19) and (20), the relation is approximately linear. 

3.3 Scaling of the parameters 

To test our predicted scaling with the number of connections, we considered networks with two 
and three times the number of neurons and connections as in Network 1; these are Networks 2 
and 3. At the same time we scaled Jqr by K~ x l 2 , v Qex by K, and p by K (see Eqs. (35a-c)). 
The value of was set, as in Network 1, to the minimum value that results in retrieval of a 
single stored pattern. The 1/K scaling of (3 (Eq. (10)) gives us the values reported as ^predicted 
in Table I. The values found from simulations {f3 in Table I) do not exactly follow the expected 
1/K scaling: (3 is 20% too large in Network 2 and 40% too large in Network 3. As discussed 
in Appendix IV, this is because of finite K effects, and the trends we see here follow the trends 
predicted in that appendix. 

Examples of stored memories in Networks 2 and 3 are shown in Figs. 5b and c, the CV versus 
firing rate is shown in Figs. 7d and g, and the distribution of background and foreground CV 
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Figure 5: Examples of activation of a retrieval state, (a) Network 1. (b) Network 2. (c) Network 
3. Colors indicate mean population activity. Blue: foreground neurons. Black: excitatory 
neurons. Red: inhibitory neurons. At t = 2 seconds, neurons selective for one of the patterns 
receive a 100 ms barrage of excitatory input; at t = 27.3 seconds, the same neurons receive a 
barrage of inhibitory input. 
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Figure 6: Retrieval states with higher external input than in Fig. 5, and thus higher background 
firing rate. All parameters except v Em u Iex , (3 and p are the same as in Network 1: here v Eex = 5000 
Hz, u Iex = 2250 Hz, (3 = 0.325 and p = 1, versus Network I, where v Eex = 1000 Hz, v Iex = 450 Hz, 
(3 = 0.167 and p = 5 The stored pattern receives input for 100 ms, starting at t = 2 seconds, 
and then receives an external inhibitory current, again for 100 ms, starting at t = 6.2 seconds. 

and firing rates during the 25 second period that the memory is active are shown in Fig. 7e and 
f for Network 2 and Fig. 7h and i for Network 3. These plots show that when the connection 
strengths are scaled properly, both the background and foreground neurons exhibit irregular 
firing, just as in Network 1. Finally, Fig. 8b and c show the relationship between the external 
input and the firing rate of the inhibitory and excitatory populations. As we saw for Network 1, 
the firing rate of excitatory and inhibitory neurons are linearly related to external input, further 
evidence for the balanced regime. In theory the lines should lie on top of each other, however 
due to finite size effects this does not happen. The fact that finite size effects are responsible for 
this deviation from the theory can be seen by noting that the lines corresponding to Network 2 
and Network 3 are much closer to each other than Network 1 and Network 2. 

3.4 Scaling of the maximum number of memories 

Our last prediction is that the maximum number of memories should be linear in the number of 
excitatory connections, Ke- To test this, for each of our three networks we increased the number 
of patterns, p, until the network failed to exhibit retrieval states. Specifically, we performed 
simulations as describe in Fig. 5, except that the memory was active for 6 seconds rather than 
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Figure 7: The distribution of CVs (coefficients of variation) and firing rates for foreground 
and background neurons. The first, second and third rows correspond to Networks 1, 2 and 3, 
respectively. Column 1. CV versus the firing rate of background (blue dots) and foreground 
(red crosses) neurons. Consistent with activation of a memory state, the neurons fall into two 
clusters, one corresponding to the foreground and the other to the background. Column 2. 
Distribution of CVs for foreground (filled red bars) and background (solid line). The mean 
of both distributions is about 0.8, reflecting the fact that the neurons are firing irregularly. 
Column 3. Distribution of firing rates for foreground (filled red bars) and background (solid 
line) . 
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Figure 8: Average excitatory (blue) and inhibitory (red) firing rate versus external input to exci- 
tatory neurons, measured as firing rate per connection {v Eex / K B ). The ratio v Iex jv Eex was fixed 
at 0.45. Full lines, dashed lines and dotted lines correspond to Networks 1, 2 and 3, respectively. 
The average rates are calculated during a 4 second period which consists of background firing 
only. The linear relationship between the mean inhibitory and excitatory firing rates and the 
external input is a signature of the balanced regime. 
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25. For each value of p, we activated all p memories one at a time. If the mean activity of 
the foreground neurons during the 6 seconds of activation was at least 3 times larger than the 
activity averaged over all excitatory neurons, then that memory was said to be successfully 
retrieved. 

The results of these simulations are shown in Fig. 9a, where we plot the fraction of successful 
retrievals versus p/Ke for the three networks. Consistent with our predictions, the transition to 
a regime where none of the patterns could be retrieved occurs at approximately the same value 
of p/Ke for all three networks. Moreover, as one would expect, the transition for the largest 
network is sharper than for the others. 

Although Fig. 9a shows that p max scales linearly with Ke, in these simulations Ne also 
scales with Ke, so this does not rule out the possibility that p max is proportional to Ne rather 
than Ke- To test for this, in Fig. 9b we plot the fraction of successful retrievals versus p/Ke, 
but this time with Ke fixed and Ne varied. This figure shows that p max is proportional to Ke, 
not Ne, ruling out the Ne scaling. 

4 Discussion 

In this paper we addressed two questions. First, can all the neurons in an attractor network 
- both background and foreground - exhibit irregular firing? And second, what is the storage 
capacity in networks of realistic spiking neurons? To answer these questions, we applied self- 
consistent signal-to-noise analysis to large networks of excitatory and inhibitory neurons, and 
we performed simulations with spiking neurons to test the predictions of that analysis. 

Our primary finding is that two conditions must be met to guarantee irregular firing of both 
foreground and background neurons. The first is proper scaling with the number of connections 
per neuron, K: the strength of the background weight matrix must scale as K~ l l 2 and the 
strength of the structured part of the weight matrix (the part responsible for the memories) as 
K^ 1 . What this scaling does is guarantee "balance," meaning the network dynamically adjusts 
its firing rates so that the mean input to a neuron is on the same order as the fluctuations, 
independent of K. This in turn guarantees that the degree of irregular firing is independent of 
K. 

While balance is a necessary condition for irregular firing, it is not sufficient. That's because 
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Figure 9: Scaling of the maximum number of patterns with the number of excitatory connections 
per neuron, K E . (a) The fraction of successful runs versus the storage load, a = p/K E , for three 
different values of K E . The size of the network is scaled such that we always have K E /N E = 
Kj/Nj = 0.15. There is a critical value of a, above which the fraction of successful runs is zero; 
this is the storage capacity a max . The transition at a max is sharp for K E = 3600 but smoother 
for K E = 2400 and K E = 1200, due to finite size effects. The fact that a ma x is almost the same 
for all three values of K E implies that the maximum number of patterns that could be stored 
and retrieved, p max , is linear in K E . (b) The fraction of successful runs versus the storage load, 
a = p/K E , for three networks with all parameters, except for the total number of neurons in the 
network, equal to those of Network 1. This figure shows that increasing the size of the network 
does not change p max . 
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balance ensures only that the mean and fluctuations are independent of K, but does not rule 
out the possibility that the mean is much larger than the fluctuations, which would result in 
regular firing. To ensure that this does not happen, a second condition must be satisfied: the 
coding level, a, must be above some (iT-independent) threshold. This condition is needed to 
ensure that the coupling between background and foreground neurons is sufficiently strong to 
stabilize a low firing rate foreground state on the unstable branch of the m-nullcline (see Fig. 

1)- 

The analysis that led to predictions of irregular firing also quite naturally provided us with 
information about the capacity of attractor networks - the maximum number of patterns that 
could be stored and successfully retrieved. What we found, under very general conditions, 
was that this maximum, denoted p max , is linear in the number of excitatory connections per 
neuron, Ke- This scaling relation has been observed in studies of simplified attractor networks 
[16,32,34], but, as discussed in the Introduction, those models did not include all the features 
that are necessary for a realistic recurrent networks. Thus, the analysis performed here is the 
first to show that the number of memories is linear in Ke in biophysically plausible networks. 

4.1 Scaling in other models, and the importance of (9(1) input to the fore- 
ground neurons 

Note that there are other types of scaling, different from what we proposed, which can result 
in irregular firing of both foreground and background neurons. What is critical is that the net 
input a foreground neuron receives from the other foreground neurons should be 0(1). We 
achieved this by letting the structured part of the connection matrix (the second term in Eq. 
(11a)) be 0(l/-fsT) and using a coding level, a, that was 0(1). However, this is not the only 
possible combination of connection strengths and coding levels, and in the two other studies 
that address both scaling and irregularity in memory networks [27,28], different combinations 
were used. In the model proposed by van Vreesjwik and Sompolinsky [27], the structured part 
of their connection matrix was a factor of K x l 2 larger than ours; to balance that, the coding 
level was a factor of K 1 / 2 smaller. In the model proposed by Renart et al. [28], the structured 
part of the synaptic weights was K times larger than ours, so their coding level had to scale as 
0(1/K). Whether such low coding levels are consistent with reality needs further investigation; 
however, data from studies conducted on selectivity of neurons to visual stimuli suggests that 
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it is too low [30]. In addition to the very low coding level that these two models require, they 
also exhibit non-biologically high foreground firing rate. Nevertheless, the model of Renart et 
al. [28] does have one advantage over others: the foreground neurons are as irregular as, or even 
more irregular than, the background neurons, something our model does not achieve (see next 
section) . 

4.2 Not as irregular as it could be 

Although our simulations showed irregular activity, we found that the mean CV was only about 
0.8. This is smaller than the values measured in vivo, which are normally close to, or slightly 
above, one [24,50-53]. In addition, in our simulations the CV showed a small, but consistent, 
decrease with firing rate (see the first column in Fig. 7). This is due to the fact that with 
the scaling that we chose, the fluctuations in the input current to foreground and background 
neurons is the same but the mean currents to the foreground neurons is higher (Appendix I). 
This decrease in the CV disagrees slightly with a study by Compte et al. [24], who found that 
the CV in prefrontal cortex does not depend on the mean firing rate, at least in a spatial memory 
task. While there are many possible reasons for this discrepancy, a likely one arises from the 
fact that the neurons in our network contained only two time scales, the membrane and synaptic 
time constants, and both were short: 10 ms for the former and 3 ms for the latter. Real neurons, 
however, have a host of long time scales that could contribute to irregularity [60]. In addition, 
in vivo optical imaging [61-63] and multi-electrode [64] studies indicate that the background 
activity varies coherently and over long time scales, on the order of seconds, something we did 
not model. Both of these would increase the CV, although how much remains to be seen. 

Although multiple time scales could certainly increase irregularity, it is not the only possible 
way to do this. As discussed in the Introduction and in the previous section, the model proposed 
by Renart et al [28] also increases irregularity, and is consistent with the experimental results 
of Compte et al [24]. However, it requires a very small coding level (a ~ 1/K), and fine tuning 
of the parameters. 
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4.3 Subthreshold versus suprathreshold persistent activity 

In conventional models of persistent activity [14, 22, 29], the foreground activity necessarily lies 
on the concave part of the excitatory gain function, F E (h E ), whereas the background activity 
lies on the convex part. Since the inflection point of realistic gain functions is typically near 
the firing threshold [42,43], this type of bistability is called suprathreshold bistability [22,28]. 
Because the concave part of the gain function is typically at high firing rate, with suprathreshold 
bistability it is hard to have either low foreground firing rate or high CV. Consequently, there 
has been interest in understanding whether it is possible to have subthreshold bistability; that 
is, whether it is possible for both foreground and background solutions to lie on the subthreshold 
part of the gain function [28]. 

The model presented here can in fact show subthreshold bistability: as discussed in Sec. 2.3, 
increasing the coding level, a, brings the foreground firing rate very close to the background 
rate. Therefore, for sufficiently large a, the foreground state would be on the convex part of the 
transfer function. Our model, and the recently proposed model by Renart et al [28], are the 
only ones that can show subthreshold bistability. 

4.4 Bimodal distribution of firing rates 

One rather striking feature of our networks is that they all produce a highly bimodal distribution 
of firing rates: as can be seen in the first and third columns of Fig. 7, the background neurons 
fire at a much lower rate than the foreground neurons - so much lower, in fact, that they form 
a distinct, and easily recognizable, population. This occurs because the patterns we store - 
the - are binary, which makes the average input current to every neuron in the foreground 
exactly the same. This feature is potentially problematic, as the distinction between foreground 
and background rates observed in experiments is not nearly as striking as the one in Fig. 7 [65] . 
However, this feature is not essential to our analysis, for two reasons. First, as discussed in Sec. 
3.2 (see especially Fig. 6b), we deliberately made the background firing rate low to increase the 
capacity. Second, it is also easy to extend our analysis to real valued patterns in which the 
elements of the £f are drawn from a continuous distribution [34]. Under this, more realistic, 
scenario, it it should be possible to match the statistics of the response seen in the cortex. This 
will be the subject of future work. 
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4.5 Fine tuning of the weights 

In our model, every time a new patterns is learned, the weights change by an amount proportional 
to K~ l . This is a factor of K l l 2 smaller than the background weights. Since weight changes 
are unlikely to be under such fine control, it is natural to ask whether errors during learning will 
lead to a major reduction in storage capacity The answer, of course, depends on the size of the 
errors. In Appendix V, we show that errors can be larger than the weight changes by a factor of 
(K/p) 1 / 2 , with only a small change in storage capacity. More specifically, every time a pattern 
is learned, noise of O ((-fTp) -1 / 2 ) can be added to the synaptic strength, and the network will 
retain its ability to store and recall patterns. 

Although this result tells us that the noise in the weight changes can be large compared 
to the structured part, the fine tuning problem is not entirely eliminated: the noise must still 
be a factor of p 1 / 2 smaller than the background weights. Because of the low storage capacity 
found in these networks (at most 2.5% [23]), even when K is as large as 10,000, 1/p 1 / 2 is on 
the order of 6%. It seems plausible that biological machinery has evolved to achieve this kind of 
precision. However, for networks with larger capacity, the requirements on the precision of the 
weight would be more stringent. 

It is also possible to have a probabilistic learning rule for which the changes in the weights 
are on the same order as the background weight, but this decreases the capacity significantly 
by a factor of \[K (see Appendix V, Eq. (V-3); we thank an anonymous reviewer for pointing 
this out). Although this probabilistic learning rule guarantees a balanced state with irregular 
background and foreground firing, it has the drawback that the storage capacity scales as \[K 
rather than K. 

4.6 Low storage capacity 

Although we showed that p max oc Ke, we did not compute analytically the constant of propor- 
tionality. In our simulations, this constant was small: from Fig. 9, p max is about O.OIKe, which 
means that for Ke = 10,000 we can store only about 100 patterns. It is important, though, 
to note that we made no attempt to optimize our network with respect to other parameters, 
so the constant of proportionality 0.01 is unlikely to be a fundamental limit. In fact, Latham 
and Nirenberg [23] were able to store about 50 patterns in a network with 2000 excitatory 
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connections, 2.5 times larger than our capacity. Interestingly, the only substantial difference 
between their network and ours was that in theirs the background activity was generated by 
endogenously active neurons rather than external input. 

Can we further increase the scaling factor? One potential mechanism is to decrease the 
coding level, a, since, at least in simple models [33,34,37], the maximum number of patterns 
that could be stored and retrieved is inversely proportional to the coding level. But, as we 
showed in Sec. 2.5, realistic networks do not exhibit this 1/a scaling. Consequently, sparse 
coding cannot be used as a way to improve the storage capacity in our network. Simplified 
models also suggest that one can increase the storage capacity by a factor of 3-4 by using other 
schemes, such as non-binary patterns [34], or spatially correlated patterns [66]. Whether these 
techniques can be extended to the kind of network we have studied here is not clear, and requires 
further investigation. However, an increase beyond a factor of 3-4, to a capacity above around 
0.1, seems unlikely within this class of networks. 

In any case, there is a limit to the number of memories that can be stored in a single attractor 
network with a fixed number of connections per neuron, no matter how many neurons in the 
network. This suggests that, in order to make the best use of the existing connections, realistic 
working memory systems must be composed of interconnected modules. In this paradigm, each 
module would consist of an attractor network [67-69]. Such modular structure naively suggests 
a combinatorial increase in storage capacity; however, understanding how to achieve such an 
increase has proved difficult. For simple models whose storage capacity could be calculated 
analytically, either no increase in the storage capacity [67] or a modest increase [69] was found. 
It is yet to be determined how modular networks could be implemented in realistic networks of 
spiking neurons, and what their storage capacity would be. 

Appendix I: Fast fluctuations 

The starting point for essentially all of our analysis is Eq. (1), which, when combined with Eq. 
(2), tells us that the time evolution of the firing rate of each neuron is a function purely of the 
firing rates of the other neurons. At a microscopic level, though, each neuron sees as input a set 
of spikes, not rates. However, for our model, rate-based equations do apply, as we show now. 
In a spiking, current-based network, the input, /iQj(t), to the i th neuron in population Q has 
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the form 

R j&R 

where tj is the time of the k th spike on the j th neuron, f R (t), which mimics the PSP, is a non- 
negative function that integrates to 1 and vanishes for t < and t large (greater than a few 10s 
of ms). In a slight departure from our usual convention, R can refer to external input (i? = ex) 
as well as excitatory and inhibitory input (R = E,I). 

Our first step is to divide the input, h,Qi, into a mean and a temporally fluctuating piece. 
The mean, which is found by time averaging the right hand side of Eq. (I-la) and using the fact 
that f R (t) integrates to 1, is simply 

<m*)>* = E E J?j R ( s m)t = E E ?- 2 ) 

where (■■■)* represents a temporal average. The temporally fluctuating piece of the input can 
then be written 

sh Qi (t) = h Qi (t) - (h Ql (t)) t = e E jT 6S !^)^ ( J - 3a ) 

R j&R 

8Sf{t) = Sf{t) - u Rj . (I-3b) 

The fluctuations, 5h Q i, have zero mean by construction, and their correlation function, Cq^t), 
is defined to be 

C Qt (r) = (Sh Qi (t + T)5h Qi {t)) t . (1-4) 

Assuming that /iq, is Gaussian (which is reasonable if there are a large number of neurons 
and they are not too correlated), then the firing rate depends only in the mean, {h,Qi(t))t, and 
the correlation function, Cqi{t). If the correlation function is independent of i, then the only 
i-dependence in the firing rate is through the mean input, and we recover Eq. (1). What we 
now show is that, for our model, Cqi does not depend on i. 

To understand the behaviour of Cq{, we express it in terms of SSj(t); using Eq. (I-3a), we 
have 

C Q i(r)=Yl E J§ R jS?(6S?{t)6S${t + T)) t 
R,R' jeRj'eR' 
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Under the assumption that the neurons are very weakly correlated, only the terms with j = j' 
survive, and this expression simplifies to 

CQi(r) = £ £( J§ R ) 2 (6S?{t)6S?{t + r)) t . 
R jeR 

Let us focus on the sum on j on the right hand side of this expression. For Q ^ E or R ^ E, 
this sum is given by (see Eqs. (6b-d)) 

jeR jeR 
For sparsely connected networks, cf- R is independent of 8SJ(t). Consequently, we can replace 
cfj R on the right hand side of Eq. (1-5) by its average, c, and the right hand side becomes 
independent of i. 

For Q = R = E the situation is more complicated, as J EE has an additional dependence on 
Aij, the structured part of the connectivity. Specifically using Eq. (6a) and again replacing c EE 
by its average, c, we have 

J EE ) 2 (8S°(t)5S°(t + r)) t = c^Jee + Ay)*(6Sf{t)6Sf{t + r)) t . (F6) 
jeE jeE 

As discussed in Appendix II, receives contributions from two sources: the p — 1 patterns that 

are not activated, and the one pattern that is. The non-activated patterns are not correlated 

with 5Sj, so they can be averaged separately in Eq. (1-6), and thus do not produce any i- 

dependence. The activated pattern, on the other hand is correlated with 5Sj. However, the 

connection strength for the one activated pattern is smaller than Jee by a factor of K~ l l 2 (see 

Sec. 1). Consequently, in the high connectivity limit, we can ignore this contribution, and the 

right hand side of Eq. (1-6) is independent of i. This in turn implies that Cqi depends only on 

Q. 

The upshot of this analysis is that the only i-dependence in the firing rate comes from 
(hQi(t))t- Moreover, comparing Eqs. (2) and (1-2), we see that {hQiit))t is exactly equal to hQi, 
the input current to the firing rate function, Fq, that appears in Eq. (1). Thus, for the model 
used here, the rate-based formulation is indeed correct. What we do not do is compute Fq, as 
that would require that we compute the correlation function, Cq(t), self-consistently, which is 
nontrivial [44] . However, our results depend very weakly on the precise form of Fq , so it is not 
necessary to have an explicit expression for it. 



Appendix II: Mean-field equations 
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In this appendix we derive the mean field equations for the model described in Sec. 1. As 
discussed in the main text, the derivation of these equations revolves around finding the distri- 
butions of 5fiEi and Shu, the fluctuations around the mean excitatory and inhibitory synaptic 
input (both quantities are defined implicitly in Eqs. (11-13)). The main assumption we make is 
that 5h Ei and 5h K are zero mean Gaussian random variables, so all we need to do is find their 
variances self-consistently. In addition, primarily for simplicity (and because it is reasonable in 
large networks in the brain), we assume that the number of connections is small compared to 
the number of neurons, so c< 1. 

Our first step is to simplify the expressions for our main order parameters, ve, m, and vj. 
In the context of the self-consistent signal-to-noise analysis, "simplify" means "replace sums by 
Gaussian integrals". To see how to do this, note that, for any function g, 



where Var[-] indicates variance, exact equality holds in the Ne — > oo limit (but approximate 
equality typically holds when Ne is only a few hundred), and 




Dz = 



dze-*' 2 
(2vr) 1 /2 



A similar expression applies, of course, to Shu. 



Applying the sum-goes-to-integral rule to Eq. (16), we have 




(II- la) 



m = 




a(l — a) 




(Il-lb) 




(II-lc) 



where the average over £ is with respect to the probability distribution given in Eq. (5). 

To complete Eq. (II- 1), we need the variance of SliEi and Shu. It is convenient to break 
the former into two pieces, SKei = SHei + Sh m i, where the first, ShEi, is associated with the 
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background neurons, and the second, Sh m i, is associated with the foreground neurons (both will 
be defined shortly). Then, examining Eqs. (11-15), and performing a small amount of algebra, 
we find that 



5h Ei = -^Jee £( c f/ ~~ c ) VE i + ~1<^ Jei y^i c ii ~ c ) u 'i 



Shu 



K E 
K E 



#1/2 



(II-2a) 
(II-2b) 



and 



Shm.i — 







K E a(l — a) 



(11-3) 



3 M 



Here 8^ v is the Kronecker delta; it is 1 if \i = v and zero otherwise. In addition, for notational 
convenience, we have returned the superscript "1" to £j. For the rest of the appendix, we will 
use £l and & interchangeably. 

Let us focus first on the contribution from the background, Eq. (II- 2). Since cf? is equal to 
c on average, the mean of both terms on the right hand side of Eq. (II-2) is zero. Moreover, 
these terms are uncorrelated, so their variances add. The variance of the QR th term is then 



Var 



K 1 / 2 
Kr 



QR 



c)v 



R3 



^E(( c r- c )( c ^- c ))w 

R 33' 



QR 



where the angle brackets represent an average over the distribution of cf, . Because c,-,- and c ., 
are independent when j ^ f , only terms with j ^ j' produce a nonzero average. Thus, all we 
need is the variance of c^- R — c, which is given by 



Var 



c QR - c 



= c(l -c) « c 



(the last approximation is valid because, as mentioned above, we are assuming c <C 1). Per- 
forming the sums over j and j' and collecting terms, we have 
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K 1 / 2 
Kr 



C)U 



p<j 



K 1 

KrNr 



K 



(II-4) 
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The term on the right hand side, {v^}, is the second moment of the firing rate of the neurons in 
pool R. Inserting Eq. (II-4) into (H-2), we find that 

Vax[ShQ] ^ 4 = E ^T R J QR^l) ■ ( n - 5 ) 

R 

The last quantity we need is the variance of 5h m . A naive approach to computing it proceeds 
along lines similar to those described above: assume all the terms in the sum over j and fi in 
Eq. (II-3) are independent, so that the variance of Sh m is just pNs (the number of terms in the 
sum) times the variance of each term. This yields, with rather loose notation for averages and 
ignoring the 0(Kg l ) correction associated with p = 1, 

All the averages in this expression are straightforward: ((cf^ 5 ) 2 ) = c, (£ 2 ) = a, ((£ — a) 2 ) = 
a(l — a), and {u 2 ) was defined in Eq. (II-4). Putting all this together and defining p 2 to be the 
variance of Sh m , we have 

Var[^ m ] ^ p 2 = . (II-6) 

l — a Kg 

While Eq. (II-6) turns out to be correct, our derivation left out a potentially important 
effect: correlations between the patterns, £f , and the overlaps, (the latter is defined in Eq. 
(11-11) below). These correlations, which arise from the recurrent feedback, turn out to scale as 
c, and so can be neglected [32, 70, 71]. Rather than show this here, we delay it until the end of 
the appendix (see subsection " Loop corrections vanish in the small c limif). 

To write our mean field equations in a compact form, it is convenient to define the total 
excitatory variance, 

o-l^al + p 2 . {11-7) 
Then, combining Eqs. (3), (II- 1), (H-5) and (H-6), the mean field equations become 
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vb = [(F E (h E + ifim + &ez))Q (II-8a) 

m = (((a(l - a))" 1 ^ - a)^/^ + £/?m + (II-8b) 

u I =(F I {h I + o I z)) z (II-8c) 

^ = ^ + r~^) < (jF|(/ls + ^ m + * EZ)) & + < F/2(/l/ + aiz) ^ (II_8d) 

^1 = §- e J 2 ie ((F 2 E (hE + £/?m + a E z))z) z + ^J 2 n (Ffihj + a I z)) z (II-8e) 
where the subscript 2 indicates a Gaussian average, 

<(■)>. = / ^(0, 

and, recall, a = p/k E (Eq. (30)). 

Finally, it is convenient to explicitly perform the averages over £ that appear in Eq. (H-8). 
Defining 

FP(h,a) = J DzF*(h,az), (11-9) 
the relevant averages become 



(F|(^ + e/?m + «T £ z))A =(l-a)F { E k \h E ,a E )+aF ( E k \h E + pm,a E ) (II-10a) 

/ 2 

<((a(l - a)) _1 (£ " a)F E (h E + £9m + ^«)){), = F E (h E + /3m, <7 £ ) - &e) ■ 

(II-10b) 



The functions Fe and that we used in Eq. (18) are equivalent to the ones defined in Eq. 
(II-9), although in the main text we suppressed the dependence on the standard deviation and 
dropped the superscript. 

Equation (II-8) constitutes our full set of mean field equations. A key component of these 
equations is that the number of memories, p, enters only through the variable a, which is p/K E . 
Thus, the number of memories that can be embedded in a network of this type is linear in the 
number of connections. 
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Loop corrections vanish in the small c limit 

To correctly treat the loop corrections in our derivation of the variance of 6h m , we need to be 
explicit about the correlations induced by the patterns, £f . We start by defining the i-dependent 
overlap, , as 



< = Ke 1 _ a) - - *)>"* • ( n - n ) 

3 

Inserting this into Eq. (II-3) leads to 

Each of the terms is a Gaussian random variable whose variance must be determined self- 
consistently. This can be done by inserting Eq. (3) into Eq. (11-11) to derive a set of nonlinear 
equations for the . There are two types of terms to consider: the activated memory, for which 
H = 1, and the non-activated memories, for which ju ^ 1. However, in the large p limit we can 
safely ignore the one term corresponding to fi = 1. Thus, considering the contributions from 
memories with fx ^ 1, we have 



\^d\ £ ^ - a)FE \ hE + ^ Pml + ShE i + P ^ m j + 1 3 E J 



K E a{ 

Taylor expanding around = and defining 

F%. = F E lh E + t){lm l + Sh Ej + 

F E j = F' E lh E + ^Pm 1 + Sh Ei + (3 Zj mU 3 
where a prime denotes a derivative, we have 



We can write Eq. (11-14) in matrix form as 

(I - A^)m^ = H M , (11-15) 
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where I is the identity matrix, the i th component of m fI is equal to m^ 4 , and the matrices A M 
and S M are given by 



13 K E a{l 



^EE/cfJ, 



v K E a(l-a) 13 [ ^ ] 



(II-16a) 
(II-16b) 



To solve Eq. (11-15) we need to invert I — A, in general a hard problem. However, what we 

— 1/2 

show now is that A has only one 0(1) eigenvalue, with the rest 0{K E ). This allows us to 
write the inverse in terms of the a single eigenvector and adjoint eigenvector, a simplification 
that allows us to perform the inversion explicitly. 

The spectrum of the random matrix, A' 4 , is determined primarily by the mean and variance 
of its components [72]. In the large Ne limit, these are given by 



A - 1 ),r4E WhE+l3ll " l+iEZ)) i-- 

where (■ ■ ■ indicates an average over i and j, and we used the fact that ^ and F^,- are 
independent. 

Given that A M is an Ne x Ne matrix, the fact that the mean and variance of its elements 
are 0(N^ 1 ) and O^KeNe)' 1 ), respectively, implies that it has one eigenvalue that is 0(1) and 
Ne — 1 eigenvalues that are OiK^^) [72]. Letting and vj, be the eigenvector and adjoint 
eigenvector of A M whose eigenvalue is we can solve Eq. (11-15) for m M , 

y V fc vj • 5* 

where "•" represents dot product. Letting k = correspond to the 0(1) eigenvalue and explicitly 
separating out this component, the expression for m fI becomes 
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m' 



and 



^ 1-A fe 1-Ao 



(11-18) 



fe 

= H M + 



Ao 



1 - Ao 



v vj • H* 1 



A = 0{F' E (h E + + a E z))t, z . 



(11-19) 



Since vo and vj are vectors whose components are all the same, without loss of generality we 
can choose vo = (1,1, /Ne and vj = (1,1,..., 1). Combining this choice with Eq. (11-18) 
and using Eq. (II- 16b) for H M , we have 



EE , cAp 

ij + 1 - A, 



KEd{l — a) 



(11-20) 



We are now in a position to return to Eq. (11-12) and compute the variance of Sh m (which, 
recall, is denoted p 2 ). Treating, as usual, all the terms in Eq. (11-12) as independent, we have 



P 2 = P 2 E^^KO^ = Pfa(mf)^ z 



(11-21) 



To compute (mf 2 )^^ we use Eq. (11-20) and the fact that the off-diagonal elements average to 
zero, and we find that 



N E 



2c 2 X c 2 \l 



(1-A ) (1-A ) 2 J 



{Fl(h E + I3jm + a E z))^ 
K 2 a{l - a) 



(11-22) 



To derive this expression we again used we used ((£ — a) 2 ) = a(l — a). 

Our final step is to insert Eq. (11-22) into (11-21). Ignoring the two terms in brackets in Eq. 
(11-22) that are a factor of c smaller than the first, and using the fact that (F 2 (h E + f3^m + 
a E z))^ z = (v 2 ), this leads to the expression for p 2 given in Eq. (II-6). Consequently, loop 
corrections vanish, and we can use our naive estimate for the variance of 5h m . 
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Ignoring the two terms in brackets in Eq. (11-22) is strictly correct for infinitely diluted 
networks; i.e., networks with c — > 0. When c is nonzero but small, the terms in the brackets can 
be ignored safely unless Ao — ► 1. However, as we now show, Ao = 1 is precisely the point where 
the background becomes unstable. Thus, it is not a regime in which we can operate. 

The significance of the limit Ao = 1 can be seen by replacing Eq. (11-15) by it's dynamical 
counterpart (see Eq. (1)), 

t e ^j- = (A" — I)m M + . (11-23) 

When the largest eigenvalue of A' 1 exceeds 1, the unactivated memories become unstable, and 
retrieval of just one memory is impossible. As discussed above, the largest eigenvalue of is 
Ao- Consequently, loop corrections are necessarily important (no matter how dilute the network 
is) at precisely the point where the unactivated memories, and thus the background, become 
unstable. 



Appendix III: Stability analysis 

To determine stability, we need to write down time-evolution equations for the order parameters, 
and then linearize those around their fixed points. For u E , v I and m, which are linear combina- 
tions of the firing rates, this is straightforward - we simply insert their definitions, Eq. (16), into 
the time-evolution equations for the individual firing rates, Eq. (1). For the variances, a\ and 
erf, the situation is much more difficult, as these quantities do not admit simple time-evolution 
equations [73]. Fortunately, we expect the effects of the variances to be small - as discussed 
in the main text, their primary effect is to smooth slightly the gain functions, something that 
typically (although presumably not always) stabilizes the dynamics. Alternatively, if we assume 
that the variances are functions of u E , i/ r and m (meaning we give them instantaneous dynamics), 
we can rigorously neglect them. This is because derivatives of the gain functions with respect 
to u E and v 1 are large, on the order of K 1 / 2 , while derivatives with respect to the variances 
are 0(1). Thus, as a first approximation, we will ignore these variables, and consider only the 
dynamics of v E , v 1 and to. Because of this approximation, we expect our stability boundaries to 
be off by a small amount. 

Combining Eqs. (1) and (II-8), the time- evolution equations for u E , Vi and to may be written 
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Te ~d = ^ Fe( - He + ^ m + dEz ))t) z - v * ( m - ia ) 

Ti ^ = {Fl ^ hl + aiz))z ~ Vi (m_lb) 

To simplify notation, it is convenient to define 

<f> E (v B , vi, m) = ((F E (h E + H3m + a E z))t.) z (III-2a) 

fafo, Vl , m) = (F^tn + aiz)) g (III-2b) 

m (i/ s , Vl , m) = <((a(l - a))- 1 ^ - a)F £ (/i£ + £/?m + <trz)> € } z . (III-2c) 



Then, linearizing Eq. (III-l) by letting u E — > i/ B + <W, ^/ — > f/ + <5^/, and m — > m + 5m, we have 



dt 



I 6» \ 

y 5m J 



I T E H<P 



E,E 



T F, <PEJ 



Tr (PE,m 



-1 



T E X 4>m,I T E \(j) mtm - 1) y 



-1/ 



where the notation (j) a ^ indicates a derivative of (f> a with respect to the argument specified by 
b (for example, 4>e,i = d(\> E jdv 1 and </>/ i?n = d<f>i/dm). Since 0/ is independent of m (which 
means 4>i,m = 0), the equation for the eigenvalues, denoted A, becomes 

= [((<f>E,E - 1) - T E X)(((j)lj - 1) - T/A) - <f> E ,I<f)I, E ](((t>m,m ~ 1) ~ T E \) 

+ [<f>I,E 4>m,I ~ {(4>I,I - 1) - Tl\)(j) m ,E]4>E,m • ( ln ~3) 

Equation (III-3) is a cubic equation in A, and thus not straightforward to solve. However, in 
the large K limit it simplifies considerably. That's because derivatives with respect to v E and 
Vj are 0(K l l 2 ), which follows because the </>'s depend on v E and Vj through h E and h,, and the 
latter are proportional to K 1 ! 2 (see Eq. (13)). Defining the 0(1) quantities 
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a = v E ,v u m and Q = v E ,Vi, Eq. (IH-3) becomes (ignoring 0{K~ x l 2 ) corrections) 

= [(^-^r^j^X^ 

+ fc4/-«/-r 1/2 T/A)4 £ ]^, m . (III-4) 

Examining Eq. (III-4), it follows that if the eigenvalue, A, is 0{K x l' 2 \ then the term 4> m ,m — ^- 
and the last term in brackets can be neglected. There are two such eigenvalues, and they are 
given by 

2 

(III-5a) 
(III-5b) 



Both eigenvalues are negative if 

T E 1( i>E,E + TfV/,/ < 

> 0. 



Since < 0, the first condition is satisfied if 77 is sufficiently small. For the second condition, 
from Eqs. (13), (21) and (III-2) we see that 

4>E,e4>I,I - <pE,l4>I,E OC D 

where the constant of proportionality is positive. Since the condition for the stability of the 
background is D > [54], we see that Eq. (III-5b) is satisfied whenever the background is 
stable. Thus, for 77 sufficiently small and the background stable, the two 0(K 1 / 2 ) eigenvalues 
are negative. 

The third eigenvalue is 0(1), so when computing it we can drop all the K~ x l 2 \ terms. 
Denoting this eigenvalue X m , we thus have 



x , 1 , (</>I,E<f>m,I ~ 4>I,I<t>m,E) 4>E,m /jjj c \ 

Am = 4>m,m ~ H 7 7 7 7 ■ (III-6) 

<PE,E<PI,I ~ <PE,I<PI,E 

Using a prime to denote a derivative with respect to He and noting that (see Eqs. (III-2)) 

dcj) Q dh Q 

_ d(j) m dh E 

^ ' dh E dVR ' 
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Eq. III-6 reduces to 

\ -A _ 1 _ ^'rn'pE.m 

where prime denotes a derivative. 

Comparing Eqs. (III-2) and Eq. (11-10), we see that (f>E,m = a4>m,,m, which leads to 

A m = m , m (l-^) -1. (IH-7) 

This expression strongly emphasizes the role of the coding level, a: if it were zero, the only stable 
equilibria would be those with (f) m ,rn < 1, which would imply high firing rates for foreground 
neurons (see Fig. lb). 

Although Eq. (III-7) tells us the stability of an equilibrium, it is not in an especially conve- 
nient form, as it does not allow us to look at a set of nullclines and determine instantly which 
equilibria are stable and which are not. However, it turns out that it is rather easy to determine 
the sign of A m for a given set of nullclines simply by looking at them. To see how, we make use 
of the expressions for 4>e and (f) m (Eqs. (III-2a) and (III-2c)) to reduce the right hand side of 
Eq. (III-7) to an expression with a single derivative. Our starting point is the definition 

tf(m) = F E {h E (m) + 15m) - F E (h E (m)), (III-8) 

where hE(m) is given by Eq. (28); the solutions of the equation \l/(m) = m correspond to network 
equilibria. The advantage of this one dimensional formulation is that, as we show below, the 
condition X m < is equivalent to d^f /dm < 1. Thus, by plotting the function ^(m) versus m 
and looking at its intersections with the 45° line, we can find the equilibrium values of m, and, 
more importantly, we can easily determine which of them is stable and which is unstable. 

To show that d$>(m)/dm < 1 is equivalent to the condition A m < 0, we note first of all that 



<Pm,m = (3F' E (h E + 13m) 

<p' m = F E (h E + [3m)-F E (h E ) 

<t>' E = aF' E (h E + f3m) + (1 - a)F' E (h E ), 

where, recall, a prime denotes a derivative. By combining these expressions with Eq. (III-7), 
and performing a small amount of algebra, the condition X m < can be written 

(3F\h E + f3m)F'{h E ) < aF'(h E + /3m) + (1 - a)F'(h E ). (111-9) 
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To see how this compares to d^/dm, we use Eq. (III-8) to write 



dm 



(3F' E (h E + f3m) + [F' E {h E + (3m) - F' E {h E )) 



dh E 
dm 



Then, usin; 



g 



Eq. (27), which tells us that 



dh E 
dm 



F' E {h E ) ' 



—a 



this expression becomes 



d^ 

dm 



1 + 



pF' E (h E + (3m)F' E {h E ) - [aF' E {h E + (3m) + (1 - a)F' E (h E )] 

aF' E {h E ) 



(111-10) 



Comparing Eqs. (III-9) and (111-10), we see that the condition d^B /dm < 1 is equivalent to 
A m < 0. Thus, it is only when *(m) intersects the 45° line from above that the equilibrium is 
stable. Since *(m) is bounded, if there are three equilibria, the smallest one must be stable, the 
middle one unstable and the largest one again stable. Thus, we can look at the nullcline plots 
and immediately determine stability (see below and Fig. III-l). 

As an example we revisit Fig. 2. In terms of our specific form for the gain functions, Eq. 
(25), and with h E (m) given by Eq. (28), the equation for m becomes 



This equation is solved graphically in Fig. Ill-la where we plot *(m) versus m for the same 
values of (3 used in Fig. 2 and with a = 0.005. Intersections with the 45° line correspond to 
solutions of Eq. (III-ll), and thus to network equilibria. 

As we saw in sections 2.3 and 2.4, the main factor that determines the number and location 
of the intersections, and thus the ability of the network to exhibit retrieval states, is (3. For 
(3 = 0.1 and 0.25, there is just one intersection at m = 0, while for intermediate values of (3, 
[3 = 0.5 and 1.2, two additional intersections appear. Increasing (3 even further moves one of 
the solutions to negative m and destabilizes the background, but this is not shown. We can now 
easily see that the curves in Fig. Ill-la with (3 = 0.1 and 0.25 have a single stable intersection 




(III-ll) 
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Figure III- 1 : (a) ^>{m) versus m for a = 0.001 and f3 = 0.1,0.25,0.5 and 1.2. The equilibrium 
values of m are the intersections of the curves ^{m) with the diagonal line. For small /3, 
m = is the only solution. For intermediate (5 there are two additional non-zero solutions, 
(b) ^(m) versus m for a = 0.05 and /3 = 1.2; the upper intersection is now at a biologically 
realistic firing rate. Note different scales compared to (a). Other parameters, as in Fig. 2, are 
Jee = J ie = 1) J ei = -1.9, J u = —1.5, h Eex = 3, h Iex = 2.1. 

at m = (meaning that the solutions with m = in Figs. 2a and b are stable); the curves with 
j3 = 0.5 and (3 = 1.2 have two stable intersections, one at m = and one at large m (and thus 
the solutions at m = in Fig. 2c are stable, those at intermediate m are unstable, and those 
with large m are again stable). 

Although we see bistability, the firing rate for the retrieval state is unrealistically high - 
on the order of 100 Hz, near saturation. As discussed in the main text, we can reduce the 
firing rate by increasing a. This is done in Fig. Ill-lb, where we plot 'l'(m) versus m but this 
time for a = 0.05 and (3 = 1.2. Again there are three intersections (corresponding to the three 
intersections between the m-nullcline with (5 = 1.2 and the /j£:-nullcline with a = 0.05 in Fig. 
2c). With this higher value of a, the upper intersection is now in a biologically realistic range. 
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Appendix IV: Retrieval states in the finite connectivity regime 



When we performed network simulations, we found that the memory strength, (3, did not exhibit 
exactly the predicted 1/K scaling. Here we ask whether the departure from predictions that we 
observed can be explained by finite K corrections. These corrections, as we will see shortly, are 
on the order of K^ 1 ! 2 . Since in our simulations K is as small as 1,500, these corrections are 
potentially large. 

Our starting point is the exact set of reduced mean field equations, which is found by 
combining Eqs. (19) and (20), 



v m + K'^D' 1 [J n h E - J E ihi] = F E (h E ) + am 

m = F E {h E + j3m) - F E (h E ) 
v& + K-^D- 1 [JeeJh - J IE h E ] = F/(/i/) . 

When K is large we can solve these equations by perturbing around the K — > oo solutions, 
which we denote h E $, mo and hio (these are the solutions to Eq. (22)). The zeroth step in this 
perturbation analysis is to replace h E and hi by h E o and hio where they appear in brackets 
(and thus multiply K~ x l 2 ). This gives us a new set of equations, 



"so + 5is E = F E (h E ) + am 

m = F E (h E + 13m) - F E (h E ) 
i/jq + 5v, = Fj(hi) 



(IV-2a) 
(IV-2b) 
(IV-2c) 



where 



6u E = K-^D- 1 [J n h E0 - J EI h I0 ] 
5u E = K^^D- 1 [J EE h I0 - Ji E h E0 ] 



(IV-3a) 
(IV-3b) 
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For the inhibitory firing rate, it is easy to see the effect of finite K: hi is shifted relative to 
hio by an amount proportional to Sv^ Only slightly more difficult are h E and m, for which we 
have to consider how 5v E affects the nullclines. Fortunately, only the /iE-nullcline is affected, 
and we see that it shifts in a direction given by the sign of 5v E . In particular, 

d(-h E ) -1 



dSv E F'(h 



(IV-4) 



(We consider -h E since, by convention we plot our nullclines in a space with -h E on the y-axis.) 
Thus, if bv E is positive then the /ig-nullcline shifts down relative to h E o, while if it is negative 
the nullcline shifts up. 

In our simulations we set (3 to (3 m i n , the minimum value of (3 that allows retrieval of one 
memory To determine how K affects (3 m i n , then, we need to know how to adjust (3 so that we 
keep the grazing intersection as K changes. Fortunately, the /i^-nullcline depends on K but 
not (3, and the m-nullcline depends on (3 but not K. Thus, all we need to know is how the 
m- nullcline changes with (3. Using Eq. (IV-2b), it is easy to show that at fixed m, 

d{-h E ) = mF'{h E + (3m) 
d(3 F'(h E + 13m) - F'(h E ) ' 

The numerator in this expression is clearly positive and, for equilibria to the left of the peak of 
the m-nullcline, the denominator is also positive (see Appendix III). Thus, increasing (3 causes 
the m-nullcline to move up. 

Combining Eqs. (IV-4) and (IV-5), we have the following picture, 



8v E decreases => h E — nullcline moves up => (3 m i n increases 

where "up" corresponds to movement in the m — (—h E ) plane. To complete the picture, we need 
to know how 8u E depends on K. From Eq. (IV-3), we see that 5v E oc K~ l l 2 [Juh E o — J E ihro] = 
K~ l l 2 [— | Jn\h E o + \J E i\hio\. Thus, whether bv E is an increasing or decreasing function of K 
depends on whether \Jn\h E o is larger or smaller than \J E i\hjo- However, as we have seen, 
typically h E is negative. Thus, we expect 5v E to be proportional to K~ 1 / 2 with a positive 
constant of proportionality, which means that 5v E is a decreasing function of K. Combining 
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Figure IV- 1: The effect of finite K corrections on /3 m j n . The minimum value of f3 at which a single 
stored pattern can be retrieved successfully, (3 m in, decreases as K decreases. The parameters 
are the same as Fig. 2 with a = 0.05. 

that with the above picture, we conclude that when K increases (3 m in also increases. This is 
shown explicitly in Fig. IV- 1. Moreover, it was exactly what we saw in our simulations: f3 m i n {(3 
in Table I) was larger than predicted when we increased K (compare (3 with (3 pre dicted m Table 
I)- 

Appendix V: Fine tuning in the learning rule 

In the model described here, the structured part of the synaptic weights scale as K -1 whereas 
the background scales as K~ 1 / 2 . This appears to require fine tuning, since adjustments to the 
weights during learning of the attractors have to be a factor of K 1 / 2 times smaller than the 
background weights; a factor that can be as high as 100. 

The first question to ask, then, is: exactly how big is the fine tuning problem? In other words, 
how much noise can we add to the learning rule without having a huge effect on the storage 
capacity? This can be answered by considering a learning rule in which the weight changes during 
learning a pattern are not quite perfect. Specifically, let us consider the following modification 
ofEq. (4), 
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where the r^- are zero- mean, uncorrelated random variables with variance cr 2 . The additional 
noise in this learning rule increases the variance of the quenched noise by an amount K E pa 2 . As 
a result, if 



a 2 ~ O {{K EP y l ) , (V-2) 

the effect on storage capacity is an 0(1) increase in the quenched noise, and thus the storage 
capacity still scales as K E . 

With the scaling in Eq. (V-2), weight changes during learning of each pattern is a factor of 
pi/2 sma n er than the background weights, and therefore the amount of fine tuning depends on 
how many patterns are stored. Because of the low storage capacity found in these networks (at 
most 2.5% [23]), even when K is as large as 10, 000, p -1 / 2 is on the order of 6%. 

We should also point out that it is possible for the weight changes associated with the 
structured part of the connectivity to be on the same order as the background, although at the 
expense of storage capacity. Let us consider a third learning rule in which each synapse has a 
probability q of changing its value during learning, 



where the q^- are Bernoulli variables; q^ = 1 with probability q and with probability 1 — q. 
Let us define the coupling strength slightly differently than in Eq. (10), 

= §. , 

a(l — a)qK E 

where, as usual, f3 ~ 0(1). With this definition, the mean memory strength, {/3'qfj), is again 
P/Kecl(1 — a), as in Eq. (10). But by setting q ~ 0(K B -1 / 2 ), the synaptic weight change - if 
there is one - is 0(if B _1//2 ), just as it is for the background weights. However, there is a major 
drawback: as is easy to show, the variance associated with the structured part of the connectivity 
increases by a factor of K E , so the maximum number of patterns scales as p max ~ \[Kb rather 
than K E . We thus use Eq. (4) for A{j in all of our analysis. 
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